Posterior Concentration
Bayesian Dyadic Trees and Histograms for Regression

Stéphanie van der Pas, Veronika Rockova

Neural Information Processing Systems

Many machine learning tools for regression are based on recursive partitioning of the covariate space into smaller regions, where the regression function can be estimated locally. Among these, regression trees and their ensembles have demonstrated impressive empirical performance. In this work, we shed light on the machinery behind Bayesian variants of these methods.
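The recursive-partitioning idea behind such methods can be illustrated with a minimal numpy sketch (the function names, the depth-1 dyadic partition of [0, 1), and the simulated step function are illustrative assumptions, not code from the paper):

```python
import numpy as np

def dyadic_histogram_fit(x, y, depth):
    """Fit a piecewise-constant regression estimate on a dyadic partition
    of [0, 1): split into 2**depth equal cells, average y within each."""
    k = 2 ** depth
    edges = np.linspace(0.0, 1.0, k + 1)
    cells = np.clip(np.digitize(x, edges) - 1, 0, k - 1)
    means = np.array([y[cells == j].mean() if np.any(cells == j) else 0.0
                      for j in range(k)])
    return edges, means

def dyadic_histogram_predict(x, edges, means):
    """Predict by looking up the cell mean for each query point."""
    k = len(means)
    cells = np.clip(np.digitize(x, edges) - 1, 0, k - 1)
    return means[cells]

rng = np.random.default_rng(0)
x = rng.uniform(size=500)
step = lambda t: np.where(t < 0.5, 1.0, 3.0)   # a true step function
y = step(x) + 0.1 * rng.standard_normal(500)

# depth 1 gives the two cells [0, 0.5) and [0.5, 1), matching the true jump
edges, means = dyadic_histogram_fit(x, y, depth=1)
```

When the true regression function really is a step function aligned with the partition, the local averages recover the step heights up to noise, which is the setting the posterior concentration results formalize.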





Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models

Li, Jipeng, Shen, Yanning

arXiv.org Artificial Intelligence

Explicit noise-level conditioning is widely regarded as essential for the effective operation of Graph Diffusion Models (GDMs). In this work, we challenge this assumption by investigating whether denoisers can implicitly infer noise levels directly from corrupted graph structures, potentially eliminating the need for explicit noise conditioning. To this end, we develop a theoretical framework centered on Bernoulli edge-flip corruptions and extend it to encompass more complex scenarios involving coupled structure-attribute noise. Extensive empirical evaluations on both synthetic and real-world graph datasets, using models such as GDSS and DiGress, provide strong support for our theoretical findings. Notably, unconditional GDMs achieve performance comparable or superior to their conditioned counterparts, while also offering reductions in parameters (4-6%) and computation time (8-10%). Our results suggest that the high-dimensional nature of graph data itself often encodes sufficient information for the denoising process, opening avenues for simpler, more efficient GDM architectures.
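The Bernoulli edge-flip corruption that the theoretical framework is built around can be sketched in a few lines (a toy numpy illustration under assumed names, not the authors' implementation; the point is that the empirical flip rate itself already reveals the noise level, which is the intuition behind dropping explicit conditioning):

```python
import numpy as np

def bernoulli_edge_flip(adj, p, rng):
    """Corrupt a symmetric 0/1 adjacency matrix: flip each off-diagonal
    edge indicator independently with probability p, keeping symmetry."""
    n = adj.shape[0]
    flips = rng.random((n, n)) < p
    flips = np.triu(flips, k=1)            # flip decisions on the upper triangle
    flips = flips | flips.T                # mirror to keep the graph undirected
    return (adj ^ flips).astype(int)

rng = np.random.default_rng(0)
n = 200
adj = np.triu(rng.integers(0, 2, (n, n)), k=1)
adj = adj | adj.T                          # random symmetric graph, no self-loops

noisy = bernoulli_edge_flip(adj, p=0.1, rng=rng)

# the fraction of flipped entries estimates p, so a denoiser can in
# principle infer the noise level from the corrupted graph alone
flip_rate = (noisy != adj)[np.triu_indices(n, k=1)].mean()
```

With a few thousand edge indicators, the observed flip rate concentrates tightly around the true p, making an explicit noise-level input largely redundant in this toy setting.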


Review for NeurIPS paper: Consistent feature selection for analytic deep neural networks

Neural Information Processing Systems

Additional Feedback: Below please find my detailed comments: 1. This paper does not consider feature selection under the high-dimensional setting. Instead, it considers an easier problem setting in which the total number of features is fixed and does not scale with the number of data points. It is important to study feature selection methods in the high-dimensional setting, where the total number of features may grow exponentially fast with the number of data points. However, various previous works actually already give selection consistency results, for example [1,2,3], and the selection consistency in [4,5] can be easily obtained with several simple arguments built upon their analysis.


Posterior Concentration for Sparse Deep Learning

Neural Information Processing Systems

We introduce Spike-and-Slab Deep Learning (SS-DL), a fully Bayesian alternative to dropout for improving generalizability of deep ReLU networks. This new type of regularization enables provable recovery of smooth input-output maps with unknown levels of smoothness. Indeed, we show that the posterior distribution concentrates at the near minimax rate for α-Hölder smooth maps, performing as well as if we knew the smoothness level α ahead of time. These network attributes typically depend on unknown smoothness in order to be optimal. We obviate this constraint with the fully Bayes construction.
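A schematic of the spike-and-slab prior itself may help (a toy sampler with an assumed Gaussian slab and illustrative parameter names; the paper's contribution is the posterior concentration theory, not this snippet):

```python
import numpy as np

def spike_and_slab_sample(shape, theta, slab_scale, rng):
    """Draw weights from a spike-and-slab prior: each weight is exactly
    zero with probability 1 - theta (the spike) and Gaussian with the
    given scale otherwise (the slab), yielding sparse networks."""
    slab = rng.normal(0.0, slab_scale, size=shape)
    keep = rng.random(shape) < theta       # Bernoulli inclusion indicators
    return np.where(keep, slab, 0.0)

rng = np.random.default_rng(0)
# one weight matrix of a hypothetical ReLU layer, ~95% exact zeros
W = spike_and_slab_sample((256, 256), theta=0.05, slab_scale=1.0, rng=rng)
sparsity = (W == 0).mean()
```

Unlike dropout, which zeroes activations transiently at training time, the spike places exact zeros on weights under the prior, and the posterior over such sparse networks is what the concentration results analyze.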


Reviews: Bayesian Dyadic Trees and Histograms for Regression

Neural Information Processing Systems

This paper analyses concentration rates (speed of posterior concentration) for Bayesian regression histograms and demonstrates that under certain conditions and priors, the posterior distribution concentrates around the true step regression function at the minimax rate. Different approximating functions are considered, starting from the set of step functions supported on equally sized intervals, up to more flexible functions supported on balanced partitions. The most important part of the paper is building the prior on the space of approximating functions. The paper is relatively clear and brings up an interesting first theoretical result regarding speed of posterior concentration for Bayesian regression histograms. The authors assume very simple conditions (one predictor, piecewise-constant functions), but this is necessary in order to get a first analysis.
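The contrast the review draws between equally sized intervals and balanced partitions can be made concrete with a small sketch (quantile-based cell edges; the data, names, and cell count are illustrative):

```python
import numpy as np

def balanced_partition_fit(x, y, k):
    """Piecewise-constant fit on a 'balanced' partition: cell edges are
    empirical quantiles of x, so every cell holds roughly n/k points
    (equal-width intervals may instead leave some cells nearly empty)."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, k + 1))
    cells = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, k - 1)
    means = np.array([y[cells == j].mean() for j in range(k)])
    return edges, means

rng = np.random.default_rng(1)
x = rng.beta(2, 5, size=600)               # covariates bunched near 0
y = np.where(x < 0.2, 0.0, 2.0) + 0.1 * rng.standard_normal(600)

edges, means = balanced_partition_fit(x, y, k=4)
counts = np.bincount(
    np.clip(np.searchsorted(edges, x, side="right") - 1, 0, 3), minlength=4)
```

Because the edges track the empirical quantiles, each cell receives about 600/4 observations even though the covariate distribution is far from uniform, which is the kind of partition flexibility the paper's more general results cover.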